%------------------------------------------------------------------------------
% Copyright (c) 1991-2014, Xavier Leroy and Didier Remy.  
%
% All rights reserved. Distributed under a creative commons
% attribution-non-commercial-share alike 2.0 France license.
% http://creativecommons.org/licenses/by-nc-sa/2.0/fr/
%
% Translation by Eliot Handelman (eliot@colba.net)
%------------------------------------------------------------------------------

\chapter{Generalities}
\cutname{generalities.html}

\section{Modules {\normalfont\texttt{Sys}} and {\normalfont\texttt{Unix}}}

Functions that give access to the system from {\ocaml} are grouped into two
modules. The first module,  \libmodule{Sys}, contains those functions
common to Unix and other operating systems under which {\ocaml} runs.
The second module, \libmodule{Unix}, contains everything specific to
Unix. 

In what follows, we will refer to identifiers from the \ml+Sys+ and
\ml+Unix+ modules without specifying which modules they come from.  That is, we
will suppose that we are within the scope of the directives 
\ml+open Sys+ and \ml+open Unix+. In complete examples, we explicitly write
\ml+open+, in order to be truly complete.

The \ml+Sys+ and \ml+Unix+ modules can redefine certain
identifiers of the \ml+Pervasives+ module, hiding previous
definitions. For example,  \ml+Pervasives.stdin+  is different from 
\ml+Unix.stdin+. The previous definitions can always be obtained
through a prefix.

To compile an {\ocaml} program that uses the 
Unix library, do this:
%
\begin{lstlisting}
ocamlc -o prog unix.cma mod1.ml mod2.ml mod3.ml 
\end{lstlisting}
%
where the program \ml+prog+ is assumed to comprise of the three modules \ml+mod1+,
\ml+mod2+ and \ml+mod3+. The modules can also be compiled separately:
%
\begin{lstlisting}
ocamlc -c mod1.ml
ocamlc -c mod2.ml
ocamlc -c mod3.ml
\end{lstlisting}
%
and linked with:
%
\begin{lstlisting}
ocamlc -o prog unix.cma mod1.cmo mod2.cmo mod3.cmo
\end{lstlisting}
%
In both cases, the argument \ml+unix.cma+ is the \ml+Unix+ library
written in {\ocaml}. To use the native-code compiler rather than the
bytecode compiler, replace \ml+ocamlc+ with \ml+ocamlopt+ and
\ml+unix.cma+ with \ml+unix.cmxa+.

If the compilation tool  \ml+ocamlbuild+ is used, simply add the
following line to the 
\ml+_tags+ file:
%
\begin{lstlisting}
<prog.{native,byte}> : use_unix
\end{lstlisting}
% 
The Unix system can also be accessed from the interactive system,
also known as the \quotes{toplevel}. If your platform supports dynamic
linking of C libraries, start an \ml+ocaml+ toplevel and type in the
directive:
%
\begin{lstlisting}
#load "unix.cma";;
\end{lstlisting}
%
Otherwise, you will need to create an interactive system containing
the pre-loaded system functions:
%
\begin{lstlisting}
ocamlmktop -o ocamlunix unix.cma
\end{lstlisting}
%
This toplevel can be started by:
\begin{lstlisting}
./ocamlunix
\end{lstlisting}

\section{Interface with the calling program}

When running a program from a shell (command interpreter), the shell
passes \emph{arguments} and an \emph{environment} to the program.  The
arguments are words on the command line that follow the name of the
command. The environment is a set of strings of the form
\texttt{variable=value}, representing the global bindings of environment
variables: bindings set with \texttt{setenv var=val} for the
\texttt{csh} shell, or with \texttt{var=val; export var} for
the \texttt{sh} shell.

The arguments passed to the program are in the string array
\ml+Sys.argv+:
%
\begin{listingcodefile}{tmpsys.mli}
val $\indexlibvalue{Sys}{argv}$ : string array
\end{listingcodefile}
%
The environment of the program is obtained by the function
\ml+Unix.environment+:
%
\begin{listingcodefile}{tmpunix.mli}
val $\indexlibvalue{Unix}{environment}$ : unit -> string array
\end{listingcodefile}
%
A more convenient way of looking up the environment is to use the
function \ml+Sys.getenv+:
%
\begin{listingcodefile}{tmpsys.mli}
val $\indexlibvalue{Sys}{getenv}$ : string -> string
\end{listingcodefile}
%
\ml+Sys.getenv v+ returns the value associated with the variable name
\ml+v+ in 
the environment, raising the exception  \ml+Not_found+ if this 
variable is not bound.
%
\begin{example}
As a first example, here is the \ml+echo+ program, which prints a
list of its arguments, as does the Unix command of the same name.
\begin{listingcodefile}{echo.ml}
let echo () = 
  let len = Array.length Sys.argv in
  if len > 1 then 
    begin
      print_string Sys.argv.(1); 
      for i = 2 to len - 1 do 
        print_char ' ';
        print_string Sys.argv.(i); 
      done;
      print_newline ();
    end;;
echo ();;
\end{listingcodefile}
\end{example}

A program can be terminated at any point with a call to \ml+exit+:
%
\begin{listingcodefile}{tmppervasives.mli}
val $\indexlibvalue{Pervasives}{exit}$ : int -> 'a
\end{listingcodefile}
%
The argument is the return code to send back to the calling program. The
convention is to return 0 if all has gone well, and to return a
non-zero code to signal an error. In conditional constructions, the
\ml+sh+ shell interprets the return code 0 as the boolean
\quotes{true}, and all non-zero codes as the boolean \quotes{false}.
%
When a program terminates normally after executing all of the
expressions of which it is composed, it makes an implicit call to
\ml+exit 0+. When a program terminates prematurely because an
exception was raised but not caught, it makes an implicit call to
\ml+exit 2+.
%
The function \ml+exit+ always flushes the buffers of all channels open for
writing. The function \ml+at_exit+ lets one register other actions
to be carried out when the program terminates.
%
\begin{listingcodefile}{tmppervasives.mli}
val $\indexlibvalue{Pervasives}{at\_exit}$ : (unit -> unit) -> unit
\end{listingcodefile}
%
The last function to be registered is called first. A function registered with
\ml+at_exit+ cannot be unregistered. However, this is not a
real restriction: we can easily get the same effect with a function
whose execution depends on a global variable.

\section{Error handling}

Unless otherwise indicated, all functions in the \ml+Unix+ module
raise the exception \ml+Unix_error+ in case of error.
%
\begin{codefile}{tmpunix.mli}
type error = Unix.error
\end{codefile}
%
\begin{listingcodefile}{tmpunix.mli}
exception $\libexn{Unix}{Unix\_error}$ of error * string * string
\end{listingcodefile}
%
The second argument of the \ml+Unix_error+ exception is the name of
the system call that raised the error. The third argument identifies,
if possible, the object on which the error occurred; for example, in
the case of a system call taking a file name as an argument, this file name will be
in the third position in \ml+Unix_error+. Finally, the first argument
of the exception is an error code indicating the nature of the
error. It belongs to the variant type \ml+error+:
%
\begin{lstlisting}
type $\libtype{Unix}{error}$ = E2BIG | EACCES | EAGAIN | ...  | EUNKNOWNERR of int
\end{lstlisting}
%
Constructors of this type have the same names and meanings as those
used in the \textsc{posix} convention and  certain errors from
\textsc{unix98} and \textsc{bsd}. All other errors use the constructor \ml+EUNKOWNERR+.

Given the semantics of exceptions, an error that is not specifically
foreseen and intercepted by a \ml+try+ propagates up to the top of a
program and causes it to terminate prematurely.  In small
applications, treating unforeseen errors as fatal is a good practice.
However, it is appropriate to display the error clearly. To do this,
the \ml+Unix+ module supplies the \ml+handle_unix_error+ function:
%
\begin{listingcodefile}{tmpunix.mli}
val $\indexlibvalue{Unix}{handle\_unix\_error}$ : ('a -> 'b) -> 'a -> 'b
\end{listingcodefile}
%
The call  \ml+handle_unix_error f x+ applies function  \ml+f+ to the
argument \ml+x+. If this raises the exception \ml+Unix_error+, a
message is displayed describing the error, and the program is
terminated with  \ml+exit 2+. A typical use is
%
\begin{lstlisting}
handle_unix_error prog ();;
\end{lstlisting}
% 
where the function \ml+prog : unit -> unit+ executes the body of the
program. For reference, here is how \ml+handle_unix_error+ is
implemented.
%
\begin{listingcodefile}[style=numbers]{handle_unix_error.ml}
open Unix;;
let handle_unix_error f arg =
  try
    f arg
  with Unix_error(err, fun_name, arg) ->
    prerr_string Sys.argv.(0); $\label{prog:argv}$
    prerr_string ": \"";
    prerr_string fun_name;
    prerr_string "\" failed";
    if String.length arg > 0 then begin
      prerr_string " on \"";
      prerr_string arg;
      prerr_string "\""
    end;
    prerr_string ": ";
    prerr_endline (error_message err); $\label{prog:errmsg}$
    exit 2;;
\end{listingcodefile}
% 
Functions of the form \ml+prerr_xxx+ are like the functions
\ml+print_xxx+, except that they write on the error channel
\ml+stderr+ rather than on the standard output channel \ml+stdout+.

The primitive \indexlibvalue{Unix}{error\_message}, of type 
\ml+error -> string+, returns a message describing the error given as an
argument (line~\ref{prog:errmsg}). The argument number zero of the
program, namely \ml+Sys.argv.(0)+, contains the name of the command
that was used to invoke the program (line~\ref{prog:argv}).

The function \ml+handle_unix_error+ handles fatal errors, \ie{} errors
that stop the program.  An advantage of {\ocaml} is that it requires
all errors to be handled, if only at the highest level by
halting the program. Indeed, any error in a system call raises an
exception, and the execution thread in progress is interrupted up to
the level where the exception is explicitly caught and handled. This avoids
continuing the program in an inconsistent state.

Errors of type \ml+Unix_error+  can, of course, be
selectively matched. We will often see the following
function later on:
%
\begin{lstlisting}
let rec restart_on_EINTR f x = 
  try f x with Unix_error (EINTR, _, _) -> restart_on_EINTR f x 
\end{lstlisting}
%
which is used to execute a function and to restart it automatically
when it executes a system call that is interrupted (see section~\ref{sec/sigsyscalls}).

\section{Library functions}

As we will see throughout the examples, system programming often
repeats the same patterns. To reduce the code of each application to
its essentials, we will want to define library functions that
factor out the common parts.

Whereas in a complete program one knows precisely which errors can be
raised (and these are often fatal, resulting in the program being stopped),
we generally do not know the execution context in the case of library functions. We
cannot suppose that all errors are fatal. It is therefore necessary to
let the error return to the caller, which will decide on a suitable
course of action (\eg{} stop the program, or handle or ignore the error). However,
the library function in general will not allow the error to simply pass
through, since it must maintain the system in a consistent state. For
example, a library function that opens a file and then applies an
operation to its file descriptor must take care to close the
descriptor in all cases, including those where the processing of the
file causes an error. This is in order to avoid a file descriptor
leak, leading to the exhaustion of file descriptors.

Furthermore, the operation applied to a file may be defined by a
function that was received as an argument, and we don't know precisely
when or how it can fail (but the caller in general will know). We are
thus often led to protect the body of the processing with
\quotes{finalization} code, which must be executed just before the
function returns, whether normally or exceptionally.

There is no built-in finalize construct \ml+try+ \ldots \ml+finalize+ in
the {\ocaml} language, but it can be easily defined\footnote{A
  built-in construct would not be less useful.}:
\begin{codefile}{misc.mli}
(** miscellaneous functions for the Unix library *)

open Sys
open Unix

(** {6 Finalization} *)

val try_finalize : ('a -> 'b) -> 'a -> ('c -> unit) -> 'c -> 'b
(** [try_finalize f x g y] applies the main code [f] to [x] and
   returns the result after having applied the finalization 
   code [g] to [y]. If the main code raises the exception
   [exn], the finalization code is executed and [exn] is raised.
   If the finalization code itself fails, the exception
   returned is always the one from the finalization code. *)
\end{codefile}
%
\begin{listingcodefile}{misc.ml}
let try_finalize f x finally y =
  let res = try f x with exn -> finally y; raise exn in 
  finally y; 
  res
\end{listingcodefile}
%
This function takes the main body \ml+f+ and the finalizer
\ml+finally+, each in the form of a function, and two parameters \ml+x+
and \ml+y+, which are passed to their respective functions. The body
of the program \ml+f x+ is executed first, and its result is kept
aside to be returned after the execution of the finalizer 
\ml+finally+. In case the program fails, \ie{} raises an exception \ml+exn+,
the finalizer is run and the exception \ml+exn+ is raised
again. If both the main function and the finalizer fail, the
finalizer's exception is raised (one could choose to have the main
function's exception raised instead).

\paragraph{Note}

In the rest of this course, we use an auxiliary library \ml+Misc+
which contains several useful functions like \ml+try_finalize+ that are often
used in the examples. We will introduce them as they are needed. To
compile the examples of the course, the definitions of the \ml+Misc+
module need to be collected and compiled.

The \ml+Misc+ module also contains certain functions, added for
illustration purposes, that will not be used in the course. These
simply enrich the \ml+Unix+ library, sometimes by redefining the
behavior of certain functions.  The \ml+Misc+ module must thus take
precedence over the \ml+Unix+ module.

\paragraph{Examples}

The course provides numerous examples. They can be compiled with
{\ocaml}, version {\ocamlversion}$\!\!\!$.  %% To get rid of that space !
Some programs will have to be slightly modified in order to work with 
older versions.

There are two kinds of examples: \quotes{library functions} (very
general functions that can be reused) and small applications. It is
important to distinguish between the two. In the case of library functions, we
want their context of use to be as general as possible. We will thus
carefully specify their interface and attentively treat all
particular cases. In the case of small applications, an error is often
fatal and causes the program to stop executing. It is sufficient to report
the cause of an error, without needing to return to a consistent state, since
the program is stopped immediately thereafter.
